Exploratory Subgroup Identification with ForestSearch
German Breast Cancer Study Group (GBSG) Analysis
Author
Larry F. León
Published
January 30, 2026
1 Introduction
This vignette demonstrates the ForestSearch methodology for exploratory subgroup identification in survival analysis, as described in León et al. (2024) Statistics in Medicine.
1.1 Motivation
In clinical trials, particularly oncology, subgroup analyses are essential for:
Evaluating treatment effect consistency across patient populations
Identifying subgroups where treatment may be detrimental (harm)
Characterizing subgroups with enhanced benefit
Informing regulatory decisions and clinical practice
While prespecified subgroups provide stronger evidence, important subgroups based on patient characteristics may not be anticipated. ForestSearch provides a principled approach to exploratory subgroup identification with proper statistical inference.
1.2 Methodology Overview
ForestSearch identifies subgroups through:
Candidate factor selection: Using LASSO and/or Generalized Random Forests (GRF)
Exhaustive subgroup search: Evaluating all combinations up to maxk factors
Bootstrap bias correction: Adjusting for selection-induced optimism
Cross-validation: Assessing algorithm stability
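To make the exhaustive search step concrete, the following is a minimal sketch of how candidate subgroups can be enumerated as all combinations of up to maxk factors. The factor cuts and the variable names (factors, candidates) are hypothetical illustrations, not output from the forestsearch package, which may additionally handle factor complements and minimum-size or minimum-event constraints.

```r
# Hypothetical single-factor cuts (illustrative only, not package output)
factors <- c("er <= 0", "age <= 50", "nodes > 3", "grade == 3")
maxk <- 2

# Build every combination of 1..maxk factors, joining factors with "&"
candidates <- unlist(lapply(seq_len(maxk), function(k) {
  combn(factors, k, FUN = paste, collapse = " & ")
}))

# Number of candidates equals choose(4, 1) + choose(4, 2)
length(candidates)
candidates
```

The number of candidates grows quickly with the number of factors and with maxk, which is one reason a candidate factor selection step precedes the search.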
The key innovation is the splitting consistency criterion: a subgroup is considered “consistent with harm” if, across many random 50/50 splits of the subgroup, both halves consistently show hazard ratios at or above a chosen “harm threshold” (for example, 1.0).
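The splitting consistency idea can be sketched on simulated data as follows. This is an illustrative approximation under stated assumptions, not the forestsearch implementation: the helper names (cox_hr, split_consistency), the number of splits, and the simulated subgroup are all hypothetical.

```r
library(survival)

set.seed(123)

# Simulated subgroup in which treatment is harmful (true hazard ratio = 2)
n <- 300
treat <- rbinom(n, 1, 0.5)
event_time <- rexp(n, rate = 0.1 * exp(log(2) * treat))
cens_time <- rexp(n, rate = 0.05)
sg <- data.frame(
  time = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time),
  treat = treat
)

# Hazard ratio (treatment vs. control) from a Cox model
cox_hr <- function(d) {
  unname(exp(coef(coxph(Surv(time, status) ~ treat, data = d))))
}

# Proportion of random 50/50 splits in which BOTH halves have HR >= threshold
split_consistency <- function(d, n_splits = 200, threshold = 1.0) {
  both_meet <- replicate(n_splits, {
    in_half1 <- sample(nrow(d)) <= nrow(d) / 2
    cox_hr(d[in_half1, ]) >= threshold && cox_hr(d[!in_half1, ]) >= threshold
  })
  mean(both_meet)
}

consistency <- split_consistency(sg)
consistency
```

A value close to 1 indicates that the harm signal persists in both halves of nearly every split; how high the proportion must be to accept a subgroup is a tuning choice.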
2 Setup
2.1 Load Required Packages
library(forestsearch)
library(survival)
library(data.table)
library(ggplot2)
library(gt)
library(grf)
library(policytree)
library(doFuture)
library(doRNG)

# Optional packages for enhanced output
library(patchwork)
library(weightedsurv)

# Set ggplot theme
theme_set(theme_minimal(base_size = 12))
3 Data: German Breast Cancer Study Group Trial
3.1 Study Background
The GBSG trial evaluated hormonal treatment (tamoxifen) in node-positive breast cancer patients, comparing patients who received tamoxifen with those who did not. Key characteristics:
Sample size: N = 686
Outcome: Recurrence-free survival time
Censoring rate: ~56%
Treatment: Hormonal therapy (tamoxifen) vs. no hormonal therapy
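The characteristics above can be checked against the gbsg data set that ships with recent versions of the survival package. Whether that data set is identical to the analysis data used in this vignette is an assumption; the column names status (1 = recurrence or death) and hormon (1 = hormonal therapy) are those documented in survival.

```r
library(survival)

# Assumption: survival::gbsg corresponds to the vignette's analysis data
dat <- survival::gbsg

# Sample size and proportion of censored observations (status == 0)
n <- nrow(dat)
censoring_rate <- mean(dat$status == 0)

n
round(100 * censoring_rate, 1)

# Patients by hormonal therapy arm (0 = no, 1 = yes)
table(hormon = dat$hormon)
```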
cat(
  "Subgroup size:",
  sum(fs$df.est$treat.recommend == 0),
  sprintf("(%.1f%% of ITT)\n", 100 * mean(fs$df.est$treat.recommend == 0))
)
Subgroup size: 82 (12.0% of ITT)
ForestSearch identifies Estrogen ≤ 0 (ER-negative) as the subgroup with potential harm. This is biologically plausible: tamoxifen is a selective estrogen receptor modulator with limited efficacy in ER-negative tumors.
6 Bootstrap Bias Correction
6.1 Rationale
Cox model estimates for an identified subgroup are biased by the selection process: the subgroup is selected because it shows an extreme effect, so the hazard ratio for the harm subgroup tends to be overestimated. Bootstrap bias correction addresses this by:
Resampling with replacement
Re-running the entire ForestSearch algorithm
Computing bias terms from bootstrap vs. observed estimates
# Number of bootstrap iterations
# Use 500-2000 for production; 1000 are used here
NB <- 1000

t0 <- proc.time()

fs_bc <- forestsearch_bootstrap_dofuture(
  fs.est = fs,
  nb_boots = NB,
  show_three = FALSE,
  details = TRUE
)
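The three steps listed in the rationale can be illustrated with a generic optimism-style bootstrap on simulated data. This is a simplified sketch, not the estimator used by forestsearch_bootstrap_dofuture: the "search" here just picks the candidate with the largest log hazard ratio, and all names (log_hr, select_subgroup, optimism) are hypothetical.

```r
library(survival)

set.seed(2024)

# Simulated trial with no true subgroup effect (true log HR = 0 everywhere)
n <- 400
dat <- data.frame(
  treat = rbinom(n, 1, 0.5),
  x1 = rbinom(n, 1, 0.3),
  x2 = rbinom(n, 1, 0.3),
  x3 = rbinom(n, 1, 0.3)
)
event_time <- rexp(n, rate = 0.1)
cens_time <- rexp(n, rate = 0.05)
dat$time <- pmin(event_time, cens_time)
dat$status <- as.integer(event_time <= cens_time)

candidates <- c("x1", "x2", "x3")

# Log hazard ratio (treatment vs. control) within the subgroup {flag == 1}
log_hr <- function(d, flag) {
  fit <- coxph(Surv(time, status) ~ treat, data = d[d[[flag]] == 1, ])
  unname(coef(fit))
}

# Toy "search": pick the candidate subgroup with the largest log HR
select_subgroup <- function(d) {
  est <- vapply(candidates, function(v) log_hr(d, v), numeric(1))
  names(which.max(est))
}

selected <- select_subgroup(dat)
observed <- log_hr(dat, selected)

# Bootstrap: rerun the search on each resample, then compare the estimate on
# the resample with the estimate for the same subgroup on the original data
n_boot <- 200
optimism <- replicate(n_boot, {
  boot <- dat[sample(n, replace = TRUE), ]
  picked <- select_subgroup(boot)
  log_hr(boot, picked) - log_hr(dat, picked)
})

bias_corrected <- observed - mean(optimism)
c(observed = observed, bias_corrected = bias_corrected)
```

Rerunning the whole selection inside each resample is the essential point: correcting only the final Cox fit would ignore the optimism introduced by the search itself.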
Kaplan-Meier survival curves by identified subgroup
Note: Identified subgroup: {er <= 0}. HR(bc) = bootstrap bias-corrected hazard ratio. Medians [95% CI] for arms are unadjusted.
6.4.1 Event Count Summary
Low event counts can lead to unstable HR estimates. This summary helps identify potential issues:
# By default, a subgroup candidate requires at least 12 events.
# Here we count how often an arm has fewer than 15 events in the bootstrap samples.
event_summary <- summarize_bootstrap_events(fs_bc, threshold = 15)
=== Bootstrap Event Count Summary ===
Total bootstrap iterations: 1000
Event threshold: <15 events
ORIGINAL Subgroup H on BOOTSTRAP samples:
Control arm <15 events: 0 (0.0%)
Treatment arm <15 events: 0 (0.0%)
Either arm <15 events: 0 (0.0%)
ORIGINAL Subgroup Hc on BOOTSTRAP samples:
Control arm <15 events: 0 (0.0%)
Treatment arm <15 events: 0 (0.0%)
Either arm <15 events: 0 (0.0%)
NEW Subgroups found: 868 (86.8%)
NEW Subgroup H* on ORIGINAL data:
Control arm <15 events: 21 (2.4% of successful)
Treatment arm <15 events: 89 (10.3% of successful)
Either arm <15 events: 100 (11.5% of successful)
NEW Subgroup Hc* on ORIGINAL data:
Control arm <15 events: 0 (0.0% of successful)
Treatment arm <15 events: 0 (0.0% of successful)
Either arm <15 events: 0 (0.0% of successful)
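For a single subgroup, the kind of per-arm event check reported above can be sketched in a few lines of base R. The data are simulated and the object names are hypothetical; this is not how summarize_bootstrap_events is implemented.

```r
set.seed(1)

# Simulated subgroup: 'status' is the event indicator, 'treat' the arm
sg <- data.frame(
  treat = rep(c(0, 1), each = 40),
  status = rbinom(80, 1, 0.4)
)

threshold <- 15

# Number of events in each arm of the subgroup
events_by_arm <- c(
  control = sum(sg$status[sg$treat == 0]),
  treatment = sum(sg$status[sg$treat == 1])
)

# Flag arms that fall below the event threshold
low_events <- events_by_arm < threshold

events_by_arm
low_events
any(low_events)
```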
6.4.2 Bootstrap Diagnostics
# Quality metrics
summaries$diagnostics_table_gt
Bootstrap Diagnostics Summary
Analysis of 1000 bootstrap iterations

Category                    Metric               Value
Success Rate                Total iterations     1000
                            Successful           868 (86.8%)
                            Failed               132 (13.2%)
                            Success rating       Good
Subgroup H (Questionable)   Observed HR          1.951
                            Bias-corrected HR    1.521
                            Bootstrap CV (%)     101.5%
                            N estimates          868
Subgroup Hc (Recommend)     Observed HR          0.615
                            Bias-corrected HR    0.644
                            Bootstrap CV (%)     48.5%
                            N estimates          868
6.4.3 Subgroup Agreement
How consistently does bootstrap identify the same subgroup?
# Agreement with original analysis
if (!is.null(summaries$subgroup_summary$original_agreement)) {
  summaries$subgroup_summary$original_agreement
}
   Metric                            Value
   <char>                            <char>
1: Total bootstrap iterations        1000
2: Successful iterations             868
3: Failed iterations (no subgroup)   132
4:
5: Original subgroup definition      {er <= 0}
6: Exact match with original         91 (10.5%)
7: Different from original           777 (89.5%)
8: Partial match (shared factor)     109 (12.6%)
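Agreement counts of this kind can be derived from the subgroup definitions found in each bootstrap iteration. The sketch below uses a small hypothetical list of definitions and assumes that a partial match means sharing at least one factor without being identical; the package's own definition may differ.

```r
# Hypothetical bootstrap results: each element holds the factors defining the
# subgroup found in one iteration; NULL marks an iteration with no subgroup
original <- c("er <= 0")
boot_subgroups <- list(
  c("er <= 0"),
  c("er <= 0", "age <= 50"),
  c("nodes > 3"),
  NULL,
  c("er <= 0")
)

found <- !vapply(boot_subgroups, is.null, logical(1))
successful <- boot_subgroups[found]

# Exact match: same set of factors as the original definition
exact <- vapply(successful, function(s) setequal(s, original), logical(1))

# Partial match (assumed definition): shares a factor but is not identical
partial <- vapply(
  successful,
  function(s) length(intersect(s, original)) > 0 && !setequal(s, original),
  logical(1)
)

agreement <- data.frame(
  metric = c("Successful", "Exact match", "Partial match", "No shared factor"),
  count = c(sum(found), sum(exact), sum(partial), sum(!exact & !partial))
)
agreement
```

Percentages in the vignette output are relative to successful iterations (for example, 91 of 868 is 10.5%), so the same denominator would apply here.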
# Factor presence across bootstrap iterations
if (!is.null(summaries$subgroup_summary$factor_presence)) {
  summaries$subgroup_summary$factor_presence
}
The solid black line shows the ITT Kaplan-Meier treatment difference estimates, with 95% CIs as the grey shaded region. Kaplan-Meier differences for the identified and reference subgroups are overlaid.
# # Core display of ITT and identified subgroups
# plot_km_band_forestsearch(
#   df = df.analysis,
#   fs.est = fs,
#   outcome.name = outcome.name,
#   event.name = event.name,
#   treat.name = treat.name
# )

# Add additional subgroups along with ITT and identified subgroups
ref_sgs <- list(
  age_young = list(subset_expr = "age < 65", color = "brown"),
  age_old = list(subset_expr = "age >= 65", color = "orange")
)

plot_km_band_forestsearch(
  df = df.analysis,
  fs.est = fs,
  ref_subgroups = ref_sgs,
  outcome.name = outcome.name,
  event.name = event.name,
  treat.name = treat.name,
  draws_band = 1000
)
# # Example with more subgroups
# ref_sgs <- list(
#   pgr_positive = list(subset_expr = "pgr > 0", color = "green"),
#   pgr_negative = list(subset_expr = "pgr <= 0", color = "purple"),
#   age_young = list(subset_expr = "age < 65", color = "brown"),
#   age_old = list(subset_expr = "age >= 65", color = "orange")
# )
# plot_km_band_forestsearch(
#   df = df.analysis,
#   fs.est = fs,
#   ref_subgroups = ref_sgs,
#   outcome.name = outcome.name,
#   event.name = event.name,
#   treat.name = treat.name
# )
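The quantity plotted in these displays, the Kaplan-Meier treatment difference over time, can be computed by hand with survfit. The sketch below uses simulated data and a hypothetical helper (km_at); it yields point estimates only and does not reproduce the confidence band drawn by plot_km_band_forestsearch.

```r
library(survival)

set.seed(42)

# Simulated two-arm data with a protective treatment effect (HR = 0.7)
n <- 200
treat <- rep(c(0, 1), each = n / 2)
event_time <- rexp(n, rate = 0.1 * exp(log(0.7) * treat))
cens_time <- runif(n, 5, 30)
d <- data.frame(
  time = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time),
  treat = treat
)

# Kaplan-Meier estimate for one arm, evaluated on a common time grid
km_at <- function(arm, times) {
  fit <- survfit(Surv(time, status) ~ 1, data = d[d$treat == arm, ])
  summary(fit, times = times, extend = TRUE)$surv
}

# Treatment minus control survival probability at each grid time
times <- seq(0, 20, by = 2)
km_diff <- km_at(1, times) - km_at(0, times)

data.frame(time = times, km_diff = round(km_diff, 3))
```

Applying the same calculation to a subset of rows (for example, a subgroup flag) gives the subgroup-specific difference curves.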
The ForestSearch analysis identifies estrogen receptor-negative (ER ≤ 0) patients as a subgroup with potential lack of benefit from hormonal therapy.
Biological plausibility: Tamoxifen is a selective estrogen receptor modulator. Its efficacy depends on ER expression. The finding that ER-negative patients may not benefit is consistent with:
Mechanistic understanding of tamoxifen action
Meta-analyses showing no tamoxifen benefit in ER-negative breast cancer
Clinical guidelines recommending tamoxifen primarily for ER-positive tumors
Caveats:
This is an exploratory analysis requiring independent validation
The bias-corrected estimates have wider confidence intervals than the unadjusted estimates
Cross-validation metrics should be evaluated for algorithm stability
León LF, Jemielita T, Guo Z, Marceau West R, Anderson KM (2024). “Exploratory subgroup identification in the heterogeneous Cox model: A relatively simple procedure.” Statistics in Medicine. DOI: 10.1002/sim.10163